Whose Personae? Synthetic Persona Experiments in LLM Research and Pathways to Transparency
Batzner, Jan, Stocker, Volker, Tang, Bingjun, Natarajan, Anusha, Chen, Qinhao, Schmid, Stefan, Kasneci, Gjergji
Synthetic personae experiments have become a prominent method in Large Language Model alignment research, yet the representativeness and ecological validity of these personae vary considerably between studies. Through a review of 63 peer-reviewed studies published between 2023 and 2025 in leading NLP and AI venues, we reveal a critical gap: task and population of interest are often underspecified in persona-based experiments, despite personalization being fundamentally dependent on these criteria. Our analysis shows substantial differences in user representation, with most studies focusing on limited sociodemographic attributes and only 35% discussing the representativeness of their LLM personae. Based on our findings, we introduce a persona transparency checklist that emphasizes representative sampling, explicit grounding in empirical data, and enhanced ecological validity. Our work provides both a comprehensive assessment of current practices and practical guidelines to improve the rigor and ecological validity of persona-based evaluations in language model alignment research.
Concerns on Bias in Large Language Models when Creating Synthetic Personae
A major concern relates to the presence of bias in these models, and creating synthetic personae has the potential to aid the investigation of how different forms of bias manifest in LLMs by introducing a new method of testing. However, the black-box nature of most of these models, and their inability to express 'opinions' contrary to overall LLM rules or fail-safes, complicates how to prompt the models to act out specific synthetic personae in various scenarios. This position paper explores a few fundamental questions: What are the benefits and drawbacks of using synthetic personae in HCI research, and how can we customize them beyond the limitations of current LLMs? The perspectives presented in this paper stem from a sub-study of a PhD project on Artificial Intelligence and Participatory Design [18]. The sub-study, currently a work in progress, aims to develop a novel method of adversarial testing [6, 13, 21] through the use of contextualized "real-life" vignettes [2, 16] prompted to the interfaces of multiple LLMs to identify potential bias, trying to open up the "black box" from a more qualitative human-computer interaction perspective [10].
2 BIAS DETECTION IN LLM INTERFACES
Research in various sub-fields has shown that human engagement in AI design, development, and evaluation, particularly in a qualitative manner, can ensure a focus on the socio-technical embeddedness of AI [3].
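The vignette-based adversarial testing described above can be sketched as a templating step: hold the scenario fixed and vary only the persona slot, so that responses from different LLM interfaces can be compared for differential treatment. This is our own minimal illustration, not the authors' implementation; the vignette text and function names are hypothetical.

```python
# Hypothetical sketch: one fixed vignette, varied only in the persona
# slot. The resulting prompts would each be sent to several LLM
# interfaces and the responses coded qualitatively for bias.
VIGNETTE = (
    "You are {persona}. A landlord reviews your rental application. "
    "Describe, from the landlord's point of view, your likelihood of approval."
)

def build_prompts(personas):
    """Return one prompt per persona; only the persona description varies."""
    return {p: VIGNETTE.format(persona=p) for p in personas}

prompts = build_prompts(["a 30-year-old teacher", "a 30-year-old refugee"])
for persona, prompt in prompts.items():
    print(persona, "->", prompt[:30], "...")
```

Keeping everything but the persona attribute constant is what makes any divergence in the model's answers attributable to the varied attribute rather than to the scenario wording.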
Exploring Augmentation and Cognitive Strategies for AI based Synthetic Personae
Gonzalez, Rafael Arias, DiPaola, Steve
Large language models (LLMs) hold potential for innovative HCI research, including the creation of synthetic personae. However, their black-box nature and propensity for hallucinations pose challenges. To address these limitations, this position paper advocates for using LLMs as data augmentation systems rather than zero-shot generators. We further propose the development of robust cognitive and memory frameworks to guide LLM responses. Initial explorations suggest that data enrichment, episodic memory, and self-reflection techniques can improve the reliability of synthetic personae and open up new avenues for HCI research.
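The episodic-memory idea mentioned in this abstract can be sketched as a buffer of past exchanges from which the most relevant episodes are recalled and prepended to each new prompt. The sketch below is our own illustration under simplifying assumptions: relevance is naive keyword overlap, whereas a real system would likely use embeddings; all class and method names are hypothetical.

```python
# Illustrative sketch of an episodic memory for a synthetic persona.
# Relevance scoring here is naive word overlap (an assumption for
# brevity); a production system would use semantic similarity.
class EpisodicMemory:
    def __init__(self, capacity=50):
        self.episodes = []        # list of (user_msg, persona_reply) pairs
        self.capacity = capacity  # keep only the most recent exchanges

    def store(self, user_msg, persona_reply):
        self.episodes.append((user_msg, persona_reply))
        self.episodes = self.episodes[-self.capacity:]

    def recall(self, query, k=2):
        """Return the k stored episodes sharing the most words with the query."""
        words = set(query.lower().split())
        scored = sorted(
            self.episodes,
            key=lambda ep: len(words & set(ep[0].lower().split())),
            reverse=True,
        )
        return scored[:k]

mem = EpisodicMemory()
mem.store("What music do you like?", "Mostly jazz from the 1960s.")
mem.store("Where did you grow up?", "A small town in Portugal.")
top = mem.recall("Tell me about the music you enjoy", k=1)
print(top[0][1])  # Mostly jazz from the 1960s.
```

Recalled episodes would be injected into the prompt before generation, which is one way such a framework could reduce the persona inconsistencies and hallucinations the paper highlights.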